Learning Rule
An artificial neural network's learning rule or learning process is a method, mathematical logic, or algorithm which improves the network's performance and/or training time. The rule is usually applied repeatedly over the network: the weights and bias levels of the network are updated as it is simulated in a specific data environment. A learning rule may take the existing conditions (weights and biases) of the network and compare the expected result with the actual result to produce new and improved values for the weights and biases. Depending on the complexity of the model being simulated, the learning rule can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations. The learning rule is one of the factors that determines how fast and how accurately the artificial network can be developed. Depending upon the process used to develop the network, there are three main models of machine learning:
# Unsupervised learning
# Supervised learning
# Reinforcement learning


Background

Many learning methods in machine learning work similarly to one another and build on one another, which makes it difficult to classify them into clear categories. They can be broadly understood as falling into four categories of learning methods, though these categories do not have clear boundaries and a given rule may belong to more than one of them:
# Hebbian - Neocognitron, Brain-state-in-a-box
# Gradient Descent - ADALINE, Hopfield Network, Recurrent Neural Network
# Competitive - Learning Vector Quantisation, Self-Organising Feature Map, Adaptive Resonance Theory
# Stochastic - Boltzmann Machine, Cauchy Machine

Although these learning rules may appear to be based on similar ideas, they have subtle differences, since each is a generalisation of or an application built on an earlier rule. It therefore makes sense to study them separately, based on their origins and intents.


Hebbian Learning

Developed by Donald Hebb in 1949 to describe the firing of biological neurons, and applied to computer simulations of neural networks from the mid-1950s onward:

\Delta w_i = \eta x_i y

where \eta is the learning rate, x_i is the input to neuron i, and y is the output of the neuron. Hebb's rule in its basic form has been shown to be unstable. Oja's rule and BCM theory are learning rules built on top of, or alongside, Hebb's rule in the study of biological neurons.
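
A minimal sketch of the rule above, assuming NumPy and a single linear neuron (the function and variable names are illustrative, not from any particular library):

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step: Delta w_i = eta * x_i * y, with y = w . x
    as the (linear) output of the neuron."""
    y = np.dot(w, x)          # neuron output for this input
    return w + eta * x * y    # correlated activity strengthens the weights

rng = np.random.default_rng(0)
w = 0.1 * rng.normal(size=3)
for _ in range(1000):
    x = rng.normal(size=3)
    w = hebbian_update(w, x)
print(np.linalg.norm(w))      # the weight norm grows without bound: the instability
```

Oja's rule addresses this instability by adding a decay term, \Delta w_i = \eta y (x_i - y w_i), which keeps the weight vector bounded.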


Perceptron Learning Rule (PLR)

The perceptron learning rule originates from the Hebbian assumption and was used by Frank Rosenblatt in his perceptron in 1958. The net is passed to the activation (transfer) function, and the function's output is used for adjusting the weights. The learning signal is the difference between the desired response and the actual response of a neuron. The step function is often used as the activation function, and the outputs are generally restricted to -1, 0, or 1. The weights are updated with

w_i^\text{new} = w_i^\text{old} + \eta (t - o) x_i

where t is the target value, o is the output of the perceptron, and \eta is the learning rate. The algorithm converges to the correct classification if:
* the training data is linearly separable
* \eta is sufficiently small (though a smaller \eta generally means a longer learning time and more epochs)

A single-layer perceptron with this learning rule is incapable of handling linearly non-separable inputs, so the XOR problem cannot be solved using this rule alone.
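
A minimal sketch of this update in NumPy, using a 0/1 step activation and the logical AND function as a linearly separable example (names are illustrative):

```python
import numpy as np

def step(net):
    # Threshold activation; a -1/+1 variant is also common.
    return 1 if net >= 0 else 0

def train_perceptron(X, t, eta=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            o = step(np.dot(w, x_i) + b)
            w += eta * (t_i - o) * x_i    # w_new = w_old + eta * (t - o) * x
            b += eta * (t_i - o)
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])                # AND: linearly separable
w, b = train_perceptron(X, t)
print([step(np.dot(w, x) + b) for x in X])    # [0, 0, 0, 1]
```

Replacing the targets with XOR ones ([0, 1, 1, 0]) makes the loop cycle forever without converging, which is exactly the XOR problem noted above.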


Backpropagation

Seppo Linnainmaa is said to have developed the backpropagation algorithm in 1970, though its origins go back to the 1960s with many contributors. It is a generalisation of the least mean squares algorithm in the linear perceptron and of the Delta learning rule. It implements a gradient descent search through the space of possible network weights, iteratively reducing the error between the target values and the network outputs.
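
A compact sketch of backpropagation in a one-hidden-layer network, assuming NumPy, sigmoid activations, and a squared-error loss (the architecture and names are illustrative choices, not prescribed by the sources above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)   # XOR: solvable with a hidden layer

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)    # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)    # hidden -> output
eta = 0.5

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # backward pass: propagate the error derivative layer by layer
    d_o = (o - t) * o * (1 - o)              # output delta (squared error, sigmoid)
    d_h = (d_o @ W2.T) * h * (1 - h)         # hidden delta
    # gradient-descent weight updates
    W2 -= eta * h.T @ d_o;  b2 -= eta * d_o.sum(axis=0)
    W1 -= eta * X.T @ d_h;  b1 -= eta * d_h.sum(axis=0)

print(o.round(2).ravel())    # typically approaches [0, 1, 1, 0]
```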


Widrow-Hoff Learning (Delta Learning Rule)

Similar to the perceptron learning rule, but with a different origin. It was developed for use in the ADALINE network, which differs from the perceptron mainly in how training is performed. The weights are adjusted according to the weighted sum of the inputs (the net), whereas in the perceptron the sign of the weighted sum was used for determining the output, with the threshold set to 0, -1, or +1. This is what makes ADALINE different from the normal perceptron. The delta rule (DR) is similar to the perceptron learning rule (PLR), with some differences:
# The error (δ) in DR is not restricted to the values 0, 1, or -1 (as in PLR), but may take any value
# DR can be derived for any differentiable output/activation function f, whereas PLR works only for a threshold output function

Sometimes the term "delta rule" is reserved for the case where Widrow-Hoff is applied to binary targets specifically, but the two terms are often used interchangeably. The delta rule is considered to be a special case of the backpropagation algorithm. It also closely resembles the Rescorla-Wagner model, under which Pavlovian conditioning occurs.
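
A sketch of the Widrow-Hoff update in NumPy: the error is computed on the raw weighted sum (the net), not on a thresholded output, so it can take any real value (the example data and names are illustrative):

```python
import numpy as np

def adaline_train(X, t, eta=0.01, epochs=100):
    """Delta rule: w <- w + eta * (t - net) * x, where net = w . x + b."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            net = np.dot(w, x_i) + b     # linear output, no threshold
            err = t_i - net              # delta: an unrestricted real value
            w += eta * err * x_i
            b += eta * err
    return w, b

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)   # AND with -1/+1 targets
w, b = adaline_train(X, t)
print(np.sign(X @ w + b))                    # threshold only at prediction time
```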


Competitive Learning

Competitive learning is considered a variant of Hebbian learning, but it is special enough to be discussed separately. Competitive learning works by increasing the specialization of each node in the network. It is well suited to finding clusters within data. Models and algorithms based on the principle of competitive learning include vector quantization and self-organizing maps (Kohonen maps).
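
A winner-take-all sketch in NumPy, a simplified form of vector quantization: only the prototype nearest to each input is updated, pulled a step toward that input (the cluster centres and names are illustrative):

```python
import numpy as np

def competitive_step(prototypes, x, eta=0.1):
    """Winner-take-all update: move only the closest prototype toward x."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    winner = np.argmin(dists)                  # competition: nearest prototype wins
    prototypes[winner] += eta * (x - prototypes[winner])
    return prototypes

rng = np.random.default_rng(0)
# Two synthetic clusters around (0, 0) and (5, 5)
data = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
rng.shuffle(data)

prototypes = rng.normal(2.5, 1.0, (2, 2))      # start between the clusters
for x in data:
    competitive_step(prototypes, x)
print(prototypes)    # each prototype typically settles near one cluster centre
```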


See also

* Machine learning
* Decision tree learning
* Pattern recognition
* Bias-variance dilemma
* Bias of an estimator
* Expectation–maximization algorithm

